A message transmitting and receiving apparatus comprising: 

A memory, storing keywords associated with said apparatus and degrees of importance of said keywords; 
A detector, detecting an occurrence of a transmitted or received message; 
Marketing Web sites is a family business 
Address data management method for online shopping, involves searching 
address ID included in delivery request from specific table for 
extracting corresponding address data to deliver purchased article 

Patent Assignee: FUJITSU LTD (FUIT ) 

Inventor: FUJIMOTO S; FUKUI M; KAKUTA J; KIHARA H; MATSUMOTO Y; MURAKAMI M; OHNO T; OKADA S 
Patent Family: 
US 20030074213 A1 20030417 US 200256089 A 20020128 200341 B 
Patent Details: 

Patent No       Kind  Lan  Pg  Main IPC     Filing Notes 
US 20030074213  A1         20  G06F-017/60 
JP 2003123004   A          13  G06F-017/60 

Abstract (Basic) : US 20030074213 A1 

NOVELTY - A correspondence table indicating the correspondence between 
address data acquired from a purchaser (100) and address IDs 
established for the address data is managed. Delivery request data 
generated by a vendor (200) is accepted based on a delivery request 
from the purchaser. The address ID in the delivery request is looked up 
in the table, and the corresponding address data is extracted to deliver 
the article to the purchaser. 
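
The lookup described in this abstract can be pictured with a minimal sketch. The class name, the use of random hexadecimal IDs, and the sample address are assumptions made for illustration only and do not come from the patent record.

```python
import uuid


class AddressRegistry:
    """Hypothetical correspondence table: address ID -> address data."""

    def __init__(self):
        self._table = {}

    def register(self, address: str) -> str:
        """Store the purchaser's address and return an opaque address ID."""
        address_id = uuid.uuid4().hex  # assumed ID scheme
        self._table[address_id] = address
        return address_id

    def resolve(self, delivery_request: dict) -> str:
        """Look up the address ID carried in a vendor's delivery request."""
        return self._table[delivery_request["address_id"]]


registry = AddressRegistry()
# Purchaser side: the address is handed to the management system only.
address_id = registry.register("1-1 Example-cho, Kawasaki")
# Vendor side: the delivery request carries the ID, never the address.
request = {"order": "book", "address_id": address_id}
# Management side: the ID is resolved when the article is delivered.
delivery_address = registry.resolve(request)
```

In this sketch the vendor only ever handles the ID, which is the property the ADVANTAGE paragraph below attributes to the method.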

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for an 
address data management system. 

USE - For managing delivery addresses during online shopping. 

ADVANTAGE - Allows a purchaser wishing to purchase merchandise from 
an online shopping site or other vendor to make a purchase and order 
delivery, while keeping delivery address data secret from third 
parties, including the seller of the merchandise. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
the address data management system. 

purchaser (100) 

merchandise vendor (200) 
US 20020023128 A1 18 G06F-015/16 
Abstract (Basic) : US 20020023128 A1 

NOVELTY - The system configures virtual communication spaces for 
transmission of messages between terminals in a network. The display 
device of each terminal displays the message along with character 
strings which are used as message sender identification information. A 
table stores the identifiers and the corresponding character strings in 
the communication spaces. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for a 
recording medium storing identifiers for persons using the chat system. 

USE - Internet chat system. 

ADVANTAGE - A character string is displayed instead of a message 
sender's identifier, so one speaker can be indicated by different 
character strings in different virtual spaces, or different character 
strings can be used and displayed for different receivers. A customer 
can avoid addressing a question to another customer by mistake. 
Inadvertent transfer of in-house information into the customer channel 
can be reduced by using different display names for different channels, 
which also allows the speakers to express their opinions and the 
listeners to read the speakers' messages. 
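
As a rough illustration of the table described in this record, the sketch below keys a display string on the pair of channel and sender identifier. The channel names, the identifier, and the display strings are invented for the example.

```python
# Hypothetical table: (virtual space, sender identifier) -> displayed string.
display_names = {
    ("#inhouse", "user42"): "Sato (dev team)",
    ("#customer", "user42"): "Support desk",
}


def render(channel: str, sender_id: str, text: str) -> str:
    """Show the per-channel character string in place of the identifier."""
    # Fall back to the raw identifier when no string is registered.
    name = display_names.get((channel, sender_id), sender_id)
    return f"<{name}> {text}"


line_inhouse = render("#inhouse", "user42", "build is green")
line_customer = render("#customer", "user42", "build is green")
```

The same speaker therefore appears under a different name in each channel, which is the effect the ADVANTAGE paragraph describes.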

DESCRIPTION OF DRAWING (S) - The figure shows the display in the 
display device of the terminal. 
Message transmitting and receiving device used in chat system, has 
processor that determines importance of keyword stored in memory and 
extracting unit that obtains keyword from received message 
Abstract (Basic) : JP 2002041431 A 

NOVELTY - The message transmitting and receiving device (121-125) 
has a processor that determines the importance of a keyword stored in a 
memory. An extracting unit obtains the keyword from a received and 
detected message. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(a) the dynamic determination of keyword and its degree of 
importance from received message; 

(b) the message transmission and reception system; and 

(c) the recording medium storing the program for dynamic 
determination of keyword and its degree of importance. 

USE - Used in chat system. 

ADVANTAGE - Dynamically determines keyword and its degree of 
importance in usual internet relay chat (IRC) system. 

DESCRIPTION OF DRAWING (S) - The figure schematically shows the 
interconnection of IRC client PC and IRC server PC in a network and the 
chat system components. Drawing includes non-English language text. 

Message transmitting and receiving device (121-125) 
Speech assisting method and device 
Abstract (Basic) : WO 200041080 A1 

NOVELTY - A speech assisting device is used with a chat client. In 
a condition DB (3), predetermined conditions of speech to one channel 
and processings of speech correlated to the conditions are stored. An 
acquiring section (7) acquires information about the channel from a chat 
client according to the conditions and processings. A judging section (4) 
judges whether or not a speech satisfies the conditions based on the 
acquired channel information before the speech is transmitted to a 
channel. An execution section (5) executes processings about the speech 
according to the result of the judgement and the conditions, and transmits 
the speech to the channel through the chat client. An example of the 
conditions is that the speech extends over 30 lines. An example of the 
processings is that the user is required to confirm the contents of the 
user's speech. 
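
The judge-then-execute flow can be sketched as follows, using the abstract's own example of a speech longer than 30 lines that must be confirmed. All function names and the shape of the condition DB are assumptions for illustration, not the patented design.

```python
from typing import Callable, List, Tuple

Condition = Callable[[str, dict], bool]   # (speech, channel info) -> matched?
Processing = Callable[[str], bool]        # speech -> may it still be sent?


def over_30_lines(speech: str, channel_info: dict) -> bool:
    """Example condition from the abstract: the speech exceeds 30 lines."""
    return len(speech.splitlines()) > 30


def make_confirm(ask: Callable[[str], bool]) -> Processing:
    """Example processing: require the user to confirm the speech."""
    return lambda speech: ask(f"Send {len(speech.splitlines())} lines?")


def assist(speech: str, channel_info: dict,
           condition_db: List[Tuple[Condition, Processing]],
           send: Callable[[str], None]) -> bool:
    """Judge the speech against each stored condition before sending it."""
    for condition, processing in condition_db:
        if condition(speech, channel_info) and not processing(speech):
            return False  # the user declined, so nothing is transmitted
    send(speech)
    return True


sent = []
# The confirmation callback always answers "no" in this demonstration.
condition_db = [(over_30_lines, make_confirm(lambda prompt: False))]
short_ok = assist("hello", {"name": "#dev"}, condition_db, sent.append)
long_ok = assist("\n".join(["x"] * 31), {"name": "#dev"}, condition_db, sent.append)
```

Here the short speech is passed straight through, while the 31-line speech is held back because the simulated user does not confirm it.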

USE - None given. 
Communication management system in chat system - has connection 
transmitter to transmit acquired utterance to communication unit 
JP 11272577 A 17 G06F-013/00 
US 6393461 B1 G06F-015/16 



Abstract (Basic) : JP 11272577 A 

NOVELTY - An utterance acquisition unit (111) acquires utterances 
that meet a predetermined condition from information obtained from a 
monitoring unit. A connection transmitter transmits the acquired 
utterances to a communication unit (102). 

USE - For computer network connected with chat system. 

ADVANTAGE - Offers communication management system in a chat system 
which makes communication establishment realizable. 

DESCRIPTION OF DRAWING (S) - The figure shows the functional block 
diagram of communication management system in chat system. 

(102) Communication unit; (111) Utterance acquisition unit. 

Dwg. 1/16 

Title Terms: COMMUNICATE; MANAGEMENT; SYSTEM; SYSTEM; CONNECT; TRANSMIT; 
TRANSMIT; ACQUIRE; COMMUNICATE; UNIT 
Derwent Class: T01 

International Patent Class (Main) : G06F-013/00; G06F-015/16 
File Segment: EPI 



8/5/6 (Item 6 from file: 350) 

DIALOG (R) File 350: Derwent WPIX 

(c) 2004 Thomson Derwent. All rts. reserv. 

012769617 **Image available** 

WPI Acc No: 1999-575840/199949 

Related WPI Acc No: 1999-623924 

XRPX Acc No: N99-424999 

Utterance log management system for computer network used in chat system 
- has controller which transmits utterance log stored in each preserving 
circuit to facsimile machine via computer network, when demand from 
authenticated user is received 
Abstract (Basic) : JP 11249990 A 

NOVELTY - A user authentication log processor (170) verifies the 
authenticity of the user based on authentication information from a 
transmitting unit. A controller transmits the utterance log stored in 
each preserving circuit (151a-151c) to a facsimile machine (330) via 
computer network, when a demand from an authenticated user is received. 

DETAILED DESCRIPTION - The utterance log is acquired from a 
predetermined chat server irrespective of whether the client terminals 
(110A-130A) are connected to the chat system. 

USE - For computer network used in chat system. 

ADVANTAGE - Prevents illegal user from participating in the chat 
system since user authentication is performed. 

DESCRIPTION OF DRAWING (S) - The figure shows the theoretical block 
diagram of the utterance log management system. 

(110A-130A) Client terminals; (151a-151c) Preserving circuit; 
(170) User authentication log processor; (330) Facsimile machine. 
Background image display control system for e.g. chat system in computer 
network - has display control unit that manages background image display 
corresponding to conditions stored in table storing unit when 
predetermined condition corresponds to stored conditions 

Patent Assignee: FUJITSU LTD (FUIT ) 

Inventor: MATSUMOTO Y ; MURAKAMI M ; OKADA S 

Number of Countries: 002 Number of Patents: 002 
Abstract (Basic): JP 11191765 A 

NOVELTY - A display control unit manages the background image 
display corresponding to the conditions stored in a table storing unit 
when a predetermined condition corresponds to the stored conditions. 

DETAILED DESCRIPTION - The system has a server (101) that controls the 
character data broadcast between computers (110,120,130,210,310) 
connected to a local area network (100,200,300). A table storing unit 
stores the defined conditions which are expressed within a chat system. 
An INDEPENDENT CLAIM is also included for a recording medium that 
records the background image display control program. 

USE - For e.g. chat system in computer network. 

ADVANTAGE - Ensures suitable background image display of character 
row when chatting with other computers. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of a 
computer network. 

(100,200,300) Local area network; (101) Server; 
(110,120,130,210,310) Computers. 
Chat system of client and server computers - has client computers which 
display name, attribute information and frequency of utterance of 
registered channel, when connection to chat server is performed 
Abstract (Basic): JP 11184786 A 

NOVELTY - Each client computer (3-5) displays the name, attribute 
information and frequency of utterance of the registered channel, when 
a connection to a chat server (1) is performed. The chat server 
switches the communication channel for chatting, based on designated 
attribute information and frequency of utterance received from each 
client. 

USE - None given. 

ADVANTAGE - Simplifies distinction of attribute information and 
frequency of utterance information in each channel since the channel is 
switched based on received designation information. Prevents 
participation of illegal users since verification and approval of 
registration are performed. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of the 
chat system of client and chat server. 

(1) Chat server; (3-5) Client computers. 
Service management procedure for chat system - involves specifying script 
file and managing implementation of script file so that implementation of 
service is managed 
Abstract (Basic): JP 11184681 A 

NOVELTY - A script file is specified and the implementation of the 
script file is managed so that the implementation of a service is 
managed. A designated service is searched from a service table and a 
script file after the designation of the service which is going to be 
implemented is received. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: a client; a service management apparatus; a recording 
medium; and a chat system. 

USE - For chat system. 

ADVANTAGE - Enables reducing the burden of a user who performs 
service transmission, amendment or deletion in an apparatus, computer 
or client. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of the 
chat system. 
Abstract (Basic) : US 5471610 A 

The system includes character string storage (105) storing a piece 
of text. A filtering device (3000) fetches character codes from the 
text read out from the string storage device to output only those 
character codes that are included in the search terms. A character 
string matching device (102) matches, en bloc, a string of character 
codes outputted from the filtering device against the search terms to 
decide whether or not the search terms exist in the string of character 
codes outputted from the filtering device. 

A synchronizing device between the filtering device and the 
character string matching device buffers differences in processing 
speed while transferring data from the filtering device to the 
character string matching device. 

ADVANTAGE - Gives high speed matching throughput without use of 
high speed memory. Provides fast and inexpensive text search. 
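
A software sketch of the filter-then-match arrangement is given below; the patent describes hardware devices, so this is only an analogy. One assumption goes beyond the abstract: simply dropping characters could make separated characters look adjacent, so the sketch collapses each run of dropped characters into a single separator. The function names and the separator are hypothetical.

```python
def filter_codes(text: str, search_terms: list, sep: str = "\0") -> str:
    """Pass only characters that occur in some search term.

    Each run of other characters becomes one separator so that the
    surviving characters keep their original adjacency (an assumption).
    """
    alphabet = set("".join(search_terms))
    out = []
    for ch in text:
        if ch in alphabet:
            out.append(ch)
        elif not out or out[-1] != sep:
            out.append(sep)
    return "".join(out)


def match_terms(filtered: str, search_terms: list) -> dict:
    """Match all search terms against the filtered character string."""
    return {term: term in filtered for term in search_terms}


text = "the chat server relays each message"
terms = ["chat", "relay"]
filtered = filter_codes(text, terms)
hits = match_terms(filtered, terms)
```

The matcher sees a string no longer than the original text, which loosely mirrors the idea of lowering the load on the matching stage.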
Abstract (Basic) : WO 9016036 A 

The document information retrieval method of effecting full text 
search has an apparatus with a magnetic disc device. Two-step presearch 
of documents is effected with respect to a keyword for the 
retrieval. In the first step of the presearch, a character table 
describing, for each document, the presence or absence of all the 
character codes included in a group of text data of the documents stored 
is generated in advance. The character table is searched using all 
character codes that constitute the keyword, and only the documents 
including the character codes are picked up. 

In the second step, compressed text data excluding annexed words 
contained in the text data and repetitively appearing words are 
generated, and documents containing the keyword as a word are picked up 
out of the documents picked up in the first step. After the second step 
(step 403), a text search (step 404) is effected according to proximity 
condition, context condition, etc. 
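
The two presearch steps can be illustrated with a small sketch. The sample documents and the stop-word set (standing in for the "annexed words") are invented, and Python sets replace whatever on-disk table format the original apparatus uses.

```python
documents = {
    1: "the chat server relays every message",
    2: "heat the cast iron pan",
    3: "a blue bird sings",
}

# Step 1 data, built in advance: which characters occur in each document.
char_table = {doc_id: set(text) for doc_id, text in documents.items()}


def presearch_chars(keyword: str) -> list:
    """Keep documents that contain every character of the keyword."""
    needed = set(keyword)
    return [d for d, chars in char_table.items() if needed <= chars]


STOP_WORDS = {"the", "a", "every"}  # assumed stand-in for annexed words

# Step 2 data: compressed text with stop words and repeated words removed.
word_table = {d: set(t.split()) - STOP_WORDS for d, t in documents.items()}


def presearch_words(keyword: str, candidates: list) -> list:
    """Keep candidates that contain the keyword as a whole word."""
    return [d for d in candidates if keyword in word_table[d]]


step1 = presearch_chars("chat")
step2 = presearch_words("chat", step1)
```

Document 2 survives the character test because it contains c, h, a and t separately, and is only rejected in the word-level step; a full text search would then run on what remains.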
ABSTRACT 

PROBLEM TO BE SOLVED: To realize the display of state information 
corresponding to the situation of a user while suppressing the operation 
load of the user. 

SOLUTION: An 'action rule' for changing how to display the state 
information of a body and an 'application situation' indicating a situation 
where the action rule is applied are stored. For example, in accordance 
with the rule information of action ID '2', a body list 'be at work' is 
displayed in hours from 9 to 17 o'clock. Thus, the body list corresponding 
to the situation of the user is automatically displayed. In accordance with 
the rule information of the action ID '2', a filter 'working' is used in 
hours from 8 to 17 o'clock. This filter suppresses the display of 
update notices of state information from any communication 
address other than '*@fujitsu.com'. Thus, it is possible to prevent the 
display of the update notice of the state information from any person other 
than colleagues in the business hours. 
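
The 'working' filter in this abstract can be sketched as a rule with an application situation (a time window) and an action (an address pattern). The rule layout and function name are assumptions; the hours and the address pattern follow the abstract.

```python
from datetime import time
from fnmatch import fnmatch

# Hypothetical encoding of the 'working' filter rule.
RULE = {"start": time(8, 0), "end": time(17, 0), "allow": "*@fujitsu.com"}


def show_update(sender: str, now: time, rule: dict = RULE) -> bool:
    """Return True when a state-information update notice is displayed."""
    in_hours = rule["start"] <= now < rule["end"]
    if not in_hours:
        return True  # outside the application situation the rule is idle
    return fnmatch(sender, rule["allow"])


colleague_shown = show_update("a@fujitsu.com", time(10, 0))
outsider_shown = show_update("b@example.org", time(10, 0))
outsider_evening = show_update("b@example.org", time(20, 0))
```

Because the rule is evaluated against the clock, the user does not have to switch filters by hand, which is the reduced operation load the abstract aims for.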
ABSTRACT 

PROBLEM TO BE SOLVED: To smoothly perform communication by recognizing the 
time at which a communicating party easily performs reception and a 
communication kind, etc., prior to communication execution at the time of 
transmission to the communicating party in communication terminal 
equipment. 

SOLUTION: Communication history information, which indicates a history of 
communication performed in the past with the communicating party and 
includes items such as communication date and time, the communication kind 
and presence/absence of a response from the opposite party, is stored in 
the communication terminal equipment such as a portable telephone. From 
the number of times of the communication in each prescribed time band and 
the number of times of succeeding in the communication, a response rate in 
each prescribed time band is analyzed. A time band in which the response 
rate is high, the response rate of the time band including the present 
time, the communication kind of the high response rate, etc., are 
presented to a user as recommended communication information. 
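
The response-rate analysis amounts to dividing successful communications by attempted communications per time band. The sketch below assumes four-hour bands and an invented history; neither figure comes from the abstract.

```python
from collections import defaultdict

# Invented sample history: (hour of day, communication kind, responded?).
history = [
    (9, "call", False), (9, "call", False),
    (9, "mail", True), (9, "mail", False),
    (20, "call", True), (20, "call", True),
    (20, "mail", False),
]


def response_rates(history: list, band_hours: int = 4) -> dict:
    """Response rate per (time band index, communication kind)."""
    tried = defaultdict(int)
    answered = defaultdict(int)
    for hour, kind, responded in history:
        key = (hour // band_hours, kind)
        tried[key] += 1
        answered[key] += responded  # True counts as 1, False as 0
    return {key: answered[key] / tried[key] for key in tried}


rates = response_rates(history)
# Recommend the band and kind with the highest response rate.
best_band, best_kind = max(rates, key=rates.get)
```

With this sample data the recommendation is a call in band 5, that is, between 20:00 and 24:00.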
ABSTRACT 

PROBLEM TO BE SOLVED: To provide a portable terminal that attains the 
compatibility between the user-friendliness and the security. 

SOLUTION: The user's past behavior patterns are stored in a user behavior 
management table 14. When the user selects a function of the portable 
terminal at a place and instructs its execution, the terminal determines 
whether or not the function has been used before around the place on the 
basis of the table 14. If the function has been used, the selected function 
is executed. If not, the portable terminal requests the user to enter a 
password. When the user enters the genuine password, the selected function 
is executed and a new behavior pattern is added to the table 14. Thus, when 
the user next executes the same function at the same place, the user 
need not enter the password. 
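
A minimal sketch of the behavior-table check follows. It treats a place as an exact label, whereas the abstract speaks of use "around the place", so any notion of proximity is left out; the class and method names are hypothetical.

```python
class Terminal:
    """Hypothetical terminal with a user behavior management table."""

    def __init__(self, password: str):
        self._password = password
        self.behavior_table = set()  # (place, function) pairs used before

    def execute(self, place: str, function: str, ask_password) -> bool:
        """Run a function, asking for the password only for new behavior."""
        if (place, function) in self.behavior_table:
            return True  # known pattern: no password needed
        if ask_password() != self._password:
            return False  # wrong password: function is not executed
        self.behavior_table.add((place, function))
        return True


prompts = []


def ask() -> str:
    """Simulated password prompt that records how often it is shown."""
    prompts.append(1)
    return "s3cret"


terminal = Terminal("s3cret")
first = terminal.execute("office", "mail", ask)
second = terminal.execute("office", "mail", ask)
```

The second call succeeds without a prompt, which illustrates the trade-off between convenience and security that the abstract targets.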
ABSTRACT 

PROBLEM TO BE SOLVED: To provide easy-to-input authentication information 
which is used to receive services from a service provider, etc., and 
corresponds to equipment. 

SOLUTION: Profile information on the equipment which receives the services 
is sent to the service provider side and according to the profile 
information (terminal type, key array, etc.), authentication information is 
generated which consists of characters and symbols easily inputted through 
the equipment to facilitate the operation for inputting authentication 
information. 



ABSTRACT 



PROBLEM TO BE SOLVED: To make speeches to be noticed easily identifiable in 
a chat system for making conversations with character information by using 
a plurality of chat terminal equipment. 

SOLUTION: Information such as the number of times of conversations or 
conversations including keywords is extracted, and speakers to be noticed 
are ranked based on the chat history information. Display colors or 
fonts at the time of display, or voices, are set corresponding to this rank 
so that speeches to be noticed are presented in more easily identifiable 
forms. 
ABSTRACT 

PROBLEM TO BE SOLVED: To display the state of conversation in a channel. 

SOLUTION: The device visually displays the state of conversation in a 
virtual space by using a graph and a chart. For example, the device 
generates a log to record a message in the conversation and a time of the 
message being spoken and also calculates a predetermined analytical item on 
the basis of the logs. The analytical items are the number of utterances, 
the utterance frequency, the number of participants, and keywords. The 
device is supposed to display the state of conversation in the 
virtual space 1) in a way to show values of the analytical items on the 
graph where different items are positioned on a vertical axis and a 
horizontal axis, 2) in the way to show the value of the analytical items on 
the graph where the analytical items are on the vertical axis and the times 
are on the horizontal axis, and 3) in the way to show the states of 
participated channels with a polygon by changing a size, a figure, and a 
color of the polygon in response to the state of the conversation. 
ABSTRACT 

PROBLEM TO BE SOLVED: To enhance convenience of a user by providing a 
substitute device with a dictionary in which conversion rules of a 
character message are described by relating it with communication device, 
notifying an instruction whether the character message to be transmitted 
and received via the substitute device is converted or not from the 
communication device to a substitute terminal and converting the character 
message according to the instruction. 

SOLUTION: An agent terminal is provided with a communication part on the 
server side, a communication part on the client side, a converting part 1, 
a dictionary managing part 2 and a dictionary data base 3. A conversion 
mode and a dictionary mode are set by the converting part 1. The character 
message to be transmitted from the converting part 1 is converted, based on 
the dictionary of the dictionary DB 3 by the dictionary managing part 2. In 
addition, the dictionary of the dictionary DB 3 is updated, based on the 
instruction from the converting part 1 by the dictionary managing part 2. 
The dictionaries to be used for conversion and to be updated are the ones 
which are made to correspond to a user terminal. Plural dictionaries are 
normally stored in the dictionary DB 3 and each dictionary is stored by 
making it correspond to the user terminal. 
ABSTRACT 



PROBLEM TO BE SOLVED: To minimize a display area on a display device by 
displaying a message transmitted and received in plural networks time 
sequentially, independently of a message of each network. 

SOLUTION: A user terminal 10 is provided with a chat system 30 for 
transmitting and receiving a message to/from another user terminal 10 
simultaneously and also in real time. When a message receiving part 33 
receives talking from the other user terminal 10, it notifies the received 
talking to a message displaying part 20 through a message notifying part 
32. The notification message receiving part 22 of the part 20 shows the 
message on a display device 11 through a display processing part 23. The 
part 23 processes the received message so as to display the message 
independently of the chat system 30. Thus, talking through all connected 
channels is shown in the same display area time sequentially. 
ABSTRACT 

PURPOSE: To provide filing technique capable of improving operability in 
retrieval by performing the attachment of retrieval information 
scrupulously when image data is accumulated without increasing a load or 
required time on a host device. 

CONSTITUTION: This device is the filing device constituted of a code/image 
conversion part 107 which converts code data arriving from a host 101 via a 
LAN 118 and a LAN adaptor 104 to the image data, an optical disk 106 in 
which obtained image data is accumulated, and a key word generating 
adaptor 108 which automatically extracts the key word of title 
information, etc., from the code data, and in which the key word 
extracted from the code data in unit of one piece of image data generated 
from the code data is stored in the optical disk 106 after it is attached 
as the retrieval information. 
ABSTRACT 

PURPOSE: To make index recovery efficient by providing a picture data 
storing part with a picture associated information control part. 

CONSTITUTION: The picture associated information control part B is provided 
at the head of the picture data storing part 1, and the addresses of 
picture associated information a(sub 1), a(sub 2)... are recorded in it. At 
the time of index recovery processing, at first, the picture associated 
information control part B is read, and the first picture associated 
information a (sub 1) is read, and an attribute necessitated for generating 
an index is extracted from the picture associated information a (sub 1), and 
is turned into a key word for retrieval. Next, the address in which the 
next picture associated information a (sub 2)... is recorded is recognized 
from the picture associated information control part B, and the picture 
associated information a (sub 2)... is read. The above-mentioned processing 
is repeated. Thus, the read of the free area of a picture data part and the 
discrimination between the free area and a picture associated area can be 
omitted, and the speed of the index recovery processing can be improved. 
Communication assistance procedure for internet relay chat involves 
matched with virtual space and notifying characteristic to user 
Patent Details: 

Patent No       Kind  Lan  Pg  Main IPC     Filing Notes 
JP 2001147880   A          13  G06F-013/00 

Abstract (Basic) : JP 2001147880 A 

NOVELTY - A keyword and a category are matched. The category of 
the keyword is specified when the keyword is included in a message 
transmitted and received in virtual space. A message is matched with 
the category and the keyword and is stored afterwards. The 
characteristic of the virtual space is computed based on the category 
of the keyword matched with the virtual space and notified to a user. 
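
One way to picture the computation is to count keyword categories over a virtual space's messages and report the most frequent one. Taking the most frequent category as the "characteristic" is an assumption, as are the keyword table and the sample log.

```python
from collections import Counter

# Hypothetical keyword -> category table.
KEYWORD_CATEGORY = {
    "goal": "sports",
    "striker": "sports",
    "compiler": "computing",
}


def characteristic(messages: list):
    """Most frequent keyword category among a virtual space's messages."""
    counts = Counter()
    for text in messages:
        for word in text.lower().split():
            category = KEYWORD_CATEGORY.get(word)
            if category:
                counts[category] += 1
    return counts.most_common(1)[0][0] if counts else None


channel_log = ["What a goal", "the striker was offside", "my compiler crashed"]
label = characteristic(channel_log)
```

Because the label is recomputed from the messages themselves, it would follow the conversation as its content shifts.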

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for a 
communication support system. 

USE - For internet relay chat (IRC). 

ADVANTAGE - Simplifies selection of virtual space in conversation 
system. Allows notification of characteristic reflecting real time 
content of conversation to user since characteristic of virtual space 
is computed based on content of conversation itself even when content 
of conversation changes. 

DESCRIPTION OF DRAWING (S) - The figure shows the entire block 
diagram of a communication support system. 

pp; 13 DwgNo 1/8 

Title Terms: COMMUNICATE; ASSIST; PROCEDURE; RELAY; COMPUTATION; 
CHARACTERISTIC; VIRTUAL; SPACE; BASED; CATEGORY; KEYWORD; MATCH; 
VIRTUAL; SPACE; NOTIFICATION; CHARACTERISTIC; USER 
Abstract (Basic) : JP 11015854 A 

NOVELTY - Knowledge data in a formed object area, in which an 
extracted document exists, are formed based on an extracted important 
keyword, and are stored in a document data memory (16). The 
important keyword about the document is extracted within the 
formation time of the document existing in the object area. 

DETAILED DESCRIPTION - A keyword extractor (12) computes and extracts an 
important keyword from a passed document. The data, containing the 
keyword formation time information of the passed document, are stored 
in a document data and keyword ensemble memory (13) which is accessed 
based on the object area set by inputting arbitrary time on a time 
axis. INDEPENDENT CLAIMS are included for the following: a document 
processing apparatus; and a recording medium storing the document 
processing method. 

USE - For extracting specific keyword from a passed document and 
for forming knowledge data serving as evaluation reference in 
describing the document. 

ADVANTAGE - Tracks, alters and accurately determines knowledge data 
even if the user's interest object varies. Detects how a user's 
contrary for interest is changed. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of the 
document processing apparatus. 

(12) keyword extractor; (13) document data and keyword ensemble memory; 
(16) document data memory. 
Priority Applications (No Type Date) : US 91723229 A 19910628 
Cited Patents: No-SR.Pub; 4.Jnl.Ref; JP 2044467; JP 2297189 
Abstract (Basic) : EP 520488 A 

The database system provides access to text based articles via 
retrieval requests. The system includes a database (20) of articles, a 
lexicon (22) and its index (24), a citation/phrase index (26) and a 
concordance index (28). The database is organised as a temporal data 
base so that versions of the elements for given times are maintained. 

Lemmas and phrases can be added to the lexicon at any point 
without reloading the lexicon. Each addition is marked with a time 
stamp. Subsets of the additions are selected and processed as a 
background task to update the other indices. 

ADVANTAGE - Provides dynamically updatable lexicon without 
reloading whole system. 
ABSTRACT 

PROBLEM TO BE SOLVED: To provide a more flexible and efficient retrieving 
device with which the concept of similarity in similarity retrieval is 
shown easy to comprehend, the attributes of retrieval conditions are 
flexibly changed and expanded, the periodicity or the like of a 
retrieval process is detected and useless retrieval procedures are 
excluded. 

SOLUTION: Stored document data in a database (DB) part 1 are designated by 
a document data designating part 4, and an additional keyword is detected 
by a retrieving person designating document keyword detecting part 6. A 
similarity retrieval keyword determining part 7 generates this detected 
keyword, the similarity retrieval keyword of a retrieval history 
managing part 8 and keyword for similarity retrieval from the history of 
retrieval, a similar document data retrieval part 10 calculates the degree 
of matching with document data, a similar document data display position 
calculating part 12 calculates the display position of document data from 
this degree of matching, and these data are displayed on a document data 
display part 3. When the retrieval is repetition having periodicity, a 
fractal dimension is calculated from the history of retrieved document 
numbers, periodicity is discriminated and retrieval attributes and 
conditions are dynamically changed. 
ABSTRACT 

PURPOSE: To shorten retrieval working time by performing retrieval by 
automatically generating an associative key word that is an appropriate 
key word for retrieval on which information is complemented at the key 
word for retrieval. 

CONSTITUTION: A thesaurus link generating part 3 generates a dynamic 
thesaurus in which a link generated by using document information is 
attached on a thesaurus 2, and a key word input part 4 for retrieval 
accepts the input of the key word for retrieval from a user. An 
associative key word generating part 5 generates a node coupled with 
the link one after another from the node corresponding to the key word 
for retrieval inputted by the user by using the dynamic thesaurus as the 
associative key word complementing the optimum information to the key 
word for retrieval. In such a way, it is possible to dispense with work 
to change the key word for retrieval and to repeat the retrieval, 
which shortens the retrieval working time. 
Detailed Description 

Claims 

Fulltext Word Count: 4648 
English Abstract 

This invention provides a system and method of creating temporal clusters 
of content included in a chat session and representing such clusters in a 
chat timeline. The chat timeline provides a visual summary of the content 
of the chat session. The system further includes additional user-friendly 
features that allow a user to view statistics of particular clusters, 
patterns of the chat session, and other relevant chat information. 

French Abstract 

L'invention concerne un systeme et un procede pour la creation de groupes 
temporaires de contenu propres a une conversation en ligne, et pour la 
visualisation de ces groupes dans un historique du deroulement de la 
conversation qui donne un apercu visuel du contenu de la conversation. Le 
systeme offre egalement des fonctions conviviales qui permettent a 
l'utilisateur de passer en revue les statistiques correspondant a des 
groupes de contenu particuliers, les orientations de la conversation et 
d'autres informations pertinentes sur le contenu de la conversation. 
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Detailed Description 

private chat session
for further discussion on the content of a current chat session;

(2) updating chatters on topic changes and further including a
feature to notify a chatter who is participating in a topic-based
discussion when the chatter changes the topic;

(3) construction of a cluster search engine that allows a user to search
a chat log by topic or keyword;

(4) detection of regular time-based patterns of discussion in a chat
room.

A time-based pattern of a chat room may include discussing a specific
topic at a certain time on a particular day of the week. Thus, for
example, an embodiment of this feature ... different "confidence ratings," depending
on which mapping is more likely to happen. For example, a chat room
with a computer theme may assign the highest rating to the mapping of
"Java → programming languages"; whereas a chat room with a Southeast
Asian theme may give "Java → geography" the highest rating. A dictionary
editor is provided for the chat room administrator to add/delete
topics, re-arrange the topic hierarchy, add/delete keywords, and edit
confidence ratings.
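
The per-room confidence ratings described in this excerpt could be held in a structure along the following lines. This is only a sketch under assumptions: the room names, the numeric ratings and the function `best_topic` are hypothetical and do not come from the patent text.

```python
# Hypothetical per-room dictionaries: keyword -> {topic: confidence rating}.
# The numbers are invented for illustration only.
ROOM_DICTIONARIES = {
    "computing": {
        "java": {"programming languages": 0.9, "geography": 0.1},
    },
    "southeast_asia": {
        "java": {"programming languages": 0.2, "geography": 0.8},
    },
}


def best_topic(room, keyword):
    """Return the highest-rated topic for a keyword in a room, or None."""
    ratings = ROOM_DICTIONARIES.get(room, {}).get(keyword.lower())
    if not ratings:
        return None
    return max(ratings, key=ratings.get)
```

With these assumed ratings, the same keyword "Java" resolves to a different topic depending on the theme of the chat room.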

Once the topic list for an utterance is generated, the system divides the 
utterance into. . . 

Claims

English Abstract

A method, system, and computer program product for facilitating 
collaboration over a computer network. A central database is established 
based on keywords or concepts descriptive of areas of interest to network
users, and descriptive of resources known to network users, and the
database can be automatically updated after each use or on a periodic 
basis. The database can be accessed by network users to identify 
resources known to others, thereby improving efficiency. 

French Abstract 

L'invention concerne un procede, un systeme et un programme informatique
destines a faciliter la collaboration sur un reseau informatique. Une
base de donnees centrale est etablie sur la base de mots-cles ou de
concepts representant des zones d'interet pour un utilisateur de reseau
ainsi que des ressources connues des utilisateurs de reseau, ladite base
de donnees pouvant etre automatiquement mise a jour apres chaque
utilisation ou sur une base periodique. Les utilisateurs de reseau
peuvent acceder a cette base de donnees pour identifier des ressources
connues d'autres utilisateurs, d'ou une efficacite amelioree.

Detailed Description
information.

In step 206, an information update is performed. This step can be
performed each time an individual user modifies information in his or
her personal profile, submits a new electronic document, or otherwise
modifies information stored in the network. To ensure that periodic
modification occurs, this step can be executed by software instructions
which automatically perform scanning to extract keywords from any
electronic documents not previously scanned (or any message board or
chat room postings), automatically send electronic mail messages to
persons identified in the database (or to all network users) requesting 
updated information, or other suitable update activities. Such an 
automatic update can be performed periodically at a frequency 
determined by a network administrator, or at a frequency specified by an 

1 Introduction



In recent years many technological advances have been made in speech
recognition. Speech-driven word-processors are becoming increasingly commercially
viable as very large vocabulary, speaker-independent continuous speech
recognition systems continue to improve. Such systems are a far cry from the
original limited-vocabulary, speaker-dependent isolated word recognisers, but their
inherent complexity and excessive amount of necessary training make them
unsuitable for certain applications.

Take for example, the case of trying to route telephone calls automatically 
to the correct department in a large store. Isolated word recognition would 
provide a computationally inexpensive and quick way of distinguishing between 
"books" and "toys", but is unfortunately rather too restrictive on what it al- 
lows the speaker to say as it forces them to talk in an un-natural way using only 
pre-designated words. 

Using a continuous speech recogniser would certainly overcome this
problem, but would introduce new difficulties. The system would have to identify
every word in the utterance, then perform syntactic and semantic analysis in
an attempt to extract the meaning of the request from the utterance. Such a
procedure would be computationally expensive, rather slow and, as it turns out,
largely unnecessary. A compromise between the two systems is needed.

Most of the calls to the system would be of the form, "Can you put me 
through to the toys section please", or "I want to speak to someone in the 
books department". By simply looking for occurrences of the words "books" 
or "toys" within the speech, a simple and fast yet syntactically unrestrictive 
system can be built with no loss of performance. 
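
The routing idea can be illustrated with a minimal sketch. It works on a text transcript rather than on a speech signal, which is a simplifying assumption made purely for illustration; the function name `route_call` and the department table are hypothetical and not part of any system described here.

```python
import re

# Hypothetical mapping from keyword to department (illustrative only).
DEPARTMENTS = {"books": "books department", "toys": "toys department"}


def route_call(transcript):
    """Return the department for the first keyword found, or None.

    Assumes a text transcript; a real word-spotter would search the
    speech signal itself rather than recognised text.
    """
    for word in re.findall(r"[a-z]+", transcript.lower()):
        if word in DEPARTMENTS:
            return DEPARTMENTS[word]
    return None
```

Everything in the request other than the keyword is ignored, which is why the caller is free to phrase the request in any way.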

This method of speech analysis is called keyword-spotting and will be dis- 
cussed in detail in this essay. Initially a definition of keyword-spotting will be 
given along with a description of several word-spotting applications. A discus- 
sion of the evaluation and implementation of different types of keyword-spotters 
will then be given followed by conclusions about the word-spotting methods 
mentioned. 

2 What is Keyword Spotting?



2.1 A Definition of Keyword Spotting 

Keyword spotting is the task of identifying the occurrences of certain desired
keywords in an arbitrary speech signal. Word-spotters overcome the syntactic
restrictions of isolated word recognition by making no assumptions about
the overall speech whilst exploiting significantly less computationally complex
systems than continuous speech recognisers, since no attempt is made to
understand the whole speech signal. This allows a significant increase in speed of
operation and reduction in the amount of necessary training.

Keyword-spotting therefore involves picking out salient information from a 
speech signal by locating a relatively small number of keywords embedded in 
some arbitrary conversation which may contain a theoretically infinite set of 
words and non-word noise. The word-spotter makes no assumptions about the 
nature of the non-keyword speech, or the syntax of the utterance. This allows 
truly natural conversational speech to be used, which may include hesitations, 
coughing, false-starts and other phenomena not normally modelled in continu- 
ous speech recognition. 

The identification of the keyword usually involves generating a list of puta- 
tive hits which specify the location of the start of the keyword utterance and a 
probability associated with this hypothesis being correct. This is illustrated in 
figure 1. 
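
A list of putative hits might be represented as follows. This is a sketch only: the class `PutativeHit`, its field names and the helper `accept_hits` are hypothetical, chosen to mirror the description above (a start location plus a probability for each hypothesis).

```python
from dataclasses import dataclass


@dataclass
class PutativeHit:
    keyword: str
    start_time: float   # seconds from the start of the recording
    probability: float  # confidence that the hypothesis is correct


def accept_hits(hits, threshold):
    """Keep hits whose probability reaches the threshold, ordered by start time."""
    kept = [hit for hit in hits if hit.probability >= threshold]
    return sorted(kept, key=lambda hit: hit.start_time)
```

Raising the threshold discards more hypotheses, which trades missed keywords against false alarms.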



2.2 Potential Applications for Keyword Spotting 

Uses for keyword spotting extend far beyond simple telephone routing systems.
The main potential applications for accurate keyword spotters are summarized
below.

Figure 1: Basic Representation of a keyword-spotter

Figure 2: Some Potential Uses for Keyword Spotting

• Indexing into Recorded Speech 

Recording speech provides an easier way of storing information than hav- 
ing to write the transcript down. An individual could use a dictaphone to 
record his own thoughts on a problem, or those of others for example in 
a lecture or meeting. For speech to supersede the use of written text as 
a storage mechanism, efficient methods of allowing the user direct access 
to a given point in the recording must be devised. Word-spotting al- 
lows an added dimension to the traditional sequential play-back of audio 
recordings. By locating and subsequently storing the time index of the 
occurrences of certain keywords within the speech, finding these words 
simply reduces to jumping directly to the required index in the recording. 

Since the speaker on the recording is generally known beforehand and 
does not change, a speaker-dependent system could be used. The key- 
words however are generally unknown and therefore a large, variable key- 
word vocabulary is needed. Such a system could also be used to assist a 
speech-based editor in locating keyword instances to allow the deletion, 
substitution and insertion of words. [21] 

• Classification of Speech Messages 

The recent developments in speech messaging and voice-mail systems ne- 
cessitate an ability to summarize speech information to prevent the re- 
cipient having to listen to irrelevant messages. A keyword spotter can 
be used to generate a list of the frequency of occurrence of various pre- 
defined keywords in the speech. This information can then be processed 
by a message-classifier to ascertain the likely topic of the message, thus
facilitating user review. [20] Such a system needs only a small fixed key- 
word vocabulary if the likely message topics are known in advance, but 
must be speaker independent to allow messages from any speaker. 

• Bit-rate reduction - Filtering Information 

Speech often contains much redundant information. Certain so-called 
"noise" words may appear frequently in the speech without adding much 
to its meaning. For transmission or storage of speech data, when compres- 
sion of the speech is critical, a very large vocabulary keyword-spotter can 
be used to "filter out" such words to improve compression ratios without
incurring any significant loss in information content. 

• Response Verification 

Several systems need to perform some verification of a speech response. 
Password recognition could be done by an isolated word recogniser, but 
using a keyword spotter allows more complicated password-phrases to be 
devised and provides more flexibility to the speaker by allowing phenom- 
ena such as coughing or silence. Radio or TV quiz phone-in lines could 
also use a keyword-spotter to automatically identify those calls containing 
the correct answer(s). Such a system must be speaker-independent and 
requires a small but flexible keyword vocabulary. 

• Word-based Commands 

Spotting keywords in real-time can also allow actions to be performed. 
If a speaker-dependent small vocabulary keyword spotter were constantly 
running in the background in a house, then the user could ask for the 
lights to be turned on, or the TV channel changed within a normal speech 
conversation. Another example is running a background keyword-spotter 
on a TV or radio broadcast, so that when an "interesting" event occurs, 
such as the commentator on a football match shouting "goal", then the
video will automatically switch on to record the replay. 

• Number Extraction 

Often the desired information in a speech phrase consists of a series of 
numbers such as a phone number or serial-code identification. By running 
a keyword-spotter such information could be extracted without regard for 
the surrounding "padding" of the speech. 

• New-Word Recognition (considering occurrences of non-keywords)

Speech-driven word-processors will make mistakes if a word is encountered 
which is not in the known vocabulary. By defining all the words in the 
vocabulary as keywords, a word-spotter can be used to detect the occur- 
rence of new words automatically by signaling when no keyword has been 
recognised in the speech. [2] 

These applications are summarized in the following table.



Application            Speaker      N° of keywords       Keyword vocab     Real-time/
                       Dep/Indep    Small/Large/Huge     Fixed/Variable    Pre-processed

Indexing Speech        D or I       L                    V                 P
Classifying Messages   I            S                    F                 P
Bit-rate Reduction     D or I       H                    F                 P
Telephone Routing      I            S                    F                 R
Response Verification  I            S                    V                 P
Word-based Commands    D or I       S                    F                 R
Number-extraction      I            S                    F                 P
New-word recognition   D or I       H                    F                 P or R

2.3 Evaluating Keyword Spotting Performance

A word-spotter generates a series of putative "hits" from the speech signal, 
which represent possible keyword occurrences. Two types of error may result 
from these hits. Type I error is when a true keyword exemplar in the speech 
is not recognised by the word-spotter. Type II error is when the word-spotter 
generates a false alarm by seemingly recognising a keyword when there was none 
present in the original. There is naturally a trade off between these errors, with 
a reduction in the number of missed keywords occurring at the expense of a 
higher false alarm rate. This trade off is represented by the operating point of 
the system. 

The keyword detection rate Pd is defined as the number of correct putative 
hits from the word-spotter divided by the number of keyword occurrences in the 
original speech sample. The false alarm rate (FAR) is defined as the number of 
false alarms per hour of speech normalised by the number of different keywords 
being considered. The FAR is thus measured in false alarms per keyword per 
hour (FA/KW/HR). By determining the detection rate for different false alarm 
rates a receiver operating curve (ROC) can be drawn. A typical ROC is shown 
in figure 3. 
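
The two definitions above translate directly into arithmetic. The sketch below uses hypothetical function names and invented counts purely to show the normalisation.

```python
def detection_rate(correct_hits, true_occurrences):
    """Pd: correct putative hits divided by keyword occurrences in the speech."""
    return correct_hits / true_occurrences


def false_alarm_rate(false_alarms, hours_of_speech, num_keywords):
    """FAR in false alarms per keyword per hour (FA/KW/HR)."""
    return false_alarms / (hours_of_speech * num_keywords)
```

For example, 45 correct hits out of 60 true occurrences give a detection rate of 0.75, and 30 false alarms over 2 hours of speech with 5 keywords give 3 FA/KW/HR.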



False alarm rates below 10 FA/KW/HR are of most interest since the system 
would seldom be run above this point and the overall performance tends to a 
steady-state response. The NIST Figure of Merit (FOM) for evaluating keyword 
spotting performance is defined as the average probability of keyword detection 
over the range of 0-10 false alarms per keyword per hour. This FOM proves to 
be relatively independent of the word-spotter's operating point and provides a 
standard to compare different word-spotting systems. 

\[ \mathrm{FOM} = \frac{1}{10} \int_{0}^{10} P_d \; d(\mathrm{FAR}) \]

where FAR is measured in FA/KW/HR.
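
The Figure of Merit, being an average of the detection rate over 0-10 FA/KW/HR, can be approximated numerically from measured ROC points. The sketch below uses the trapezoidal rule with linear interpolation at the upper limit; this numerical scheme and the function name `figure_of_merit` are assumptions for illustration, not the NIST scoring procedure itself.

```python
def figure_of_merit(roc_points, max_far=10.0):
    """Average detection rate over 0..max_far FA/KW/HR.

    roc_points: list of (far, pd) pairs sorted by increasing far,
    starting at far = 0 and reaching at least max_far.
    """
    area = 0.0
    for (far0, pd0), (far1, pd1) in zip(roc_points, roc_points[1:]):
        if far0 >= max_far:
            break
        if far1 > max_far:
            # Interpolate the detection rate at the upper integration limit.
            pd1 = pd0 + (pd1 - pd0) * (max_far - far0) / (far1 - far0)
            far1 = max_far
        area += 0.5 * (pd0 + pd1) * (far1 - far0)
    return area / max_far
```

Points measured beyond 10 FA/KW/HR are ignored apart from the interpolation, in line with the observation that such operating points are seldom of interest.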




[Plot: horizontal axis False Alarm Rate (FA/KW/HR), 0 to 12]



Figure 3: Typical Receiver Operating Curve (ROC) for Keyword Spotting 

3 Techniques for Implementing Keyword Spotters



3.1 A General HMM-based Keyword Spotting System 

Template matching using dynamic time warping offers a possible method of
modelling speech information. However, HMM-based systems are generally
superior at modelling the acoustic variability which occurs between different
utterances of the same word and offer a more natural extension to modelling
non-keyword speech through maximum-likelihood training.



[Block diagram: speech -> pre-processing and feature extraction -> features -> HMM-based word spotter -> secondary ... -> modified ...]

Figure 4: Overall Architecture for an HMM-based Keyword Spotter

Figure 4 shows the overall architecture of a keyword-spotter. The same 
principles would apply if the central HMM-based word-spotter were replaced by 
any fundamental speech recognition technique, such as template-based dynamic 
time warping connected speech recognition or recurrent neural networks, but 
this essay will concentrate on HMM-based systems as they are the most popular 
and successful at present. 

3.2 The Keyword/Filler Word-Spotter Implementation 

A representation of a simple HMM-based keyword classifier is given in figure 5. 




Figure 5: A simple keyword classifier 

Each keyword is modelled as an individual HMM. Since it is often beneficial 
to allow more than one keyword to be defined, the individual keyword-models 
are placed in parallel, thus enabling any keyword to be recognised by the classi- 
fier. To extend this idea to allow sequences of words to be recognised, the start 
and end node of the model (shaded in figure 5) are made into grammar nodes 
and a null transition (footnote 1) is added from the right to the left grammar node. This
can now be considered as a small vocabulary continuous speech recogniser. 



1 i.e. one which takes no time and produces no output

This network can be used to identify the most likely sequence of keywords
in the speech sample. It could also theoretically be used as a basic keyword- 
spotter. It could certainly distinguish between the keywords, and by thresh- 
olding the likelihood of the keyword occurring, a primitive means of rejecting 
false-alarms, and thus modelling non-keyword speech could be implemented. 
This approach, however, has a major flaw. Absolute values of the likelihood of 
the keyword occurrences are influenced by the noise and channel characteristics 
of the recording. A method of normalising for these effects is required. This 
is implemented by including models which represent non-keyword speech, so a 
comparison of the two can be made before declaring the putative hits. 
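
One common way to make this comparison is a per-frame log-likelihood ratio between the keyword path and the filler path. The sketch below assumes that form of score; the function names and the numbers in the usage note are hypothetical, and the cited systems may score hits differently.

```python
def normalised_score(keyword_loglik, filler_loglik, num_frames):
    """Per-frame log-likelihood ratio of the keyword path against the filler path."""
    return (keyword_loglik - filler_loglik) / num_frames


def is_putative_hit(keyword_loglik, filler_loglik, num_frames, threshold=0.0):
    """Declare a hit only if the keyword beats the filler by the threshold."""
    return normalised_score(keyword_loglik, filler_loglik, num_frames) > threshold
```

Suppose a 50-frame segment scores -200 under the keyword model and -260 under the filler, giving a ratio of 1.2. If a noisy channel lowers both log-likelihoods by 100, the absolute keyword score changes but the ratio stays at 1.2, which is the normalising effect described above.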

In order to model the non-keyword speech explicitly, an alternative path 
through the network is thus incorporated. This is generally called a filler or 
garbage model. Some systems also add extra models for non-speech events, such 
as coughing, laughter or silence. The resulting general HMM-based keyword- 
spotter is illustrated in figure 6. 




[Network diagram: parallel paths for keyword 1, keyword 2, ..., keyword n, the non-keyword filler model(s), and a background noise or silence model]




Figure 6: A general HMM-based Keyword Spotter 

Sometimes inter-word transition weights are added to the paths through the 
network. [19] These can be used to adjust the operating point of the system by 
penalising certain paths, thus altering the detection probability and false alarm 
rate of the word-spotter. 
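
The effect of such a weight on the operating point can be shown with a toy example. It assumes the weight acts as a fixed log-domain penalty on entering the keyword path; the candidate scores and the function `count_hits` are invented for illustration.

```python
def count_hits(candidates, keyword_penalty):
    """Count detections and false alarms for a given keyword-path penalty.

    candidates: list of (keyword_logscore, filler_logscore, is_true_keyword).
    A candidate becomes a hit when the penalised keyword path still
    scores higher than the filler path.
    """
    detections = 0
    false_alarms = 0
    for keyword_score, filler_score, is_true_keyword in candidates:
        if keyword_score - keyword_penalty > filler_score:
            if is_true_keyword:
                detections += 1
            else:
                false_alarms += 1
    return detections, false_alarms
```

Increasing the penalty removes false alarms first in this toy data, but eventually costs genuine detections as well, which is the trade-off traced out by the ROC.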

The methods of implementation and the resultant performance of this generic 
word-spotter are greatly affected by many different parameters, including 

• Choice and number of keywords. [7, 21] 

• Choice of language model. [7, 18] 

• Choice of feature vector from the speech signal. [17] 
• Method of modelling keywords. [18]

• Method of modelling and the nature of non-keyword speech. [18, 19] 

• Training methods and available training data. [15] 

• Choice of scoring system in generating putative hits. [18] 

• Including secondary processing to reduce false alarm rates. [11, 20]

Each of these will be discussed in turn in the following section.

3.3 Improving the performance of the HMM-based word-spotter

3.3.1 Choice and Number of Keywords 

Some applications, such as password verification, allow a completely uncon- 
strained choice of keyword. In these cases the keywords can be chosen to max- 
imise word-spotter performance by considering the length and acoustic content 
of both the keyword and non-keyword speech. 

Mono-syllabic words are generally a poor choice for keywords as word-spotter 
performance has been shown to increase with the number of syllables in a given 
keyword. [21] This includes both a higher detection probability generally and a 
lower false alarm rate given a constant probability of detection. This suggests 
phrases rather than single words could increase performance further. Increasing 
the length of a keyword also reduces the likelihood of spurious hits due to the 
keyword being a sub-string of a different word such as "class" and
"classification".

Choosing keywords which are not phonetically similar to other words in the 
speech will obviously improve performance. "Isocrates" is a good choice of key- 
word [7] as it does not sound like any other English word, whereas "pact" may 
well be mis-identified as "backed" or even "pecked".

For information retrieval applications such as the classification of video-mail,
further constraints on keyword choice increase the overall system performance
(although not necessarily the word-spotting FOM) (footnote 2). These include the
frequency of occurrence of the chosen keyword and its relevance and exclusivity to
the topic it represents.

Both the number and flexibility of keywords needed in a given application
affect the method used in implementation and the operating speed of the
resulting word-spotter.



2 see section 3.3.8 



3.3.2 Choice of Language Model 



Most word-spotters use a null grammar which allows a completely unrestrained 
syntax in that any word can follow any other with equal probability. This is cer- 
tainly more applicable for word-spotting than general continuous speech recog- 
nition, as the comparative rarity of keyword occurrences in most word-spotting 
applications means less grammatical and syntactical information is available to 
help restrict the possible word combinations. 

It has been shown [18], however, that in some very restricted circumstances 
incorporating a statistical language model into the keyword spotter can increase 
performance. In the unlikely event that the available training data has the same 
word-sequence statistics as the target data and a large vocabulary type keyword 
spotter is being used, then statistical language data may be included in the 
system. Rohlicek [18] showed an increase of 6.6% in the FOM using a bigram 
model instead of the traditional null-grammar model used for keyword spotting. 
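
The difference between the two kinds of language model can be sketched as follows. The toy sentences and the function names are hypothetical, and the bigram estimate is plain relative frequency without the smoothing a practical system would need.

```python
from collections import Counter, defaultdict


def null_grammar(vocabulary):
    """Null grammar: every word follows every other with equal probability."""
    p = 1.0 / len(vocabulary)
    return {prev: {word: p for word in vocabulary} for prev in vocabulary}


def bigram_model(sentences):
    """Bigram probabilities estimated by relative frequency (no smoothing)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for prev, word in zip(sentence, sentence[1:]):
            counts[prev][word] += 1
    return {
        prev: {word: n / sum(following.values()) for word, n in following.items()}
        for prev, following in counts.items()
    }
```

The bigram model is only useful when the training sentences share word-sequence statistics with the target speech, which is the restriction noted above.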

The simple grammar shown in figure 7 has been shown [7] to improve per- 
formance on voice-activated applications, where the likelihood of the keyword 
being surrounded by momentary silence is quite high. 



Figure 7: The simple grammar model for voice-activated systems

3.3.3 Choice of Feature Vector

Before the speech can be analysed it must be converted into a compact
representation. This is called a feature vector. These features can be frame-based [17, 20]
or segment-based. [11] Several possible properties of the speech can be used as
features. Ideally they should be robust, sufficient and complete in that they
should encapsulate all the information about the speech with the minimum of
redundancy and be insensitive to noise.

Telephone speech is generally sampled at 10 kHz and band-limited to
300-3300 Hz before feature extraction. [11, 17, 19] Mel-spaced samples of a log LPC
power spectrum have been used as features but were found to produce poorer
performance than that obtained with Mel-frequency cepstral coefficients and
their derivatives (footnote 3). [17]

3 Including derivatives improves the performance of the word-spotter for several different 
feature vectors [17] 
The distribution of the features can be normalised over a single speaker in an
attempt to counteract inter-speaker variability, but this has been shown to have
no effect when cepstral coefficients are used. [17] Energy normalisation, however,
is often used to reduce the effect of short-term energy variations, and
compensation for channel and speaker variability is performed by a simple subtraction
on the cepstral vector. [18, 19]
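
The subtraction is not spelled out here. A widely used form is to subtract the mean cepstral vector of the utterance from every frame, and the sketch below assumes that form; the function name is hypothetical.

```python
def subtract_cepstral_mean(frames):
    """Subtract the per-utterance mean cepstral vector from every frame.

    frames: list of equal-length lists of cepstral coefficients.
    Assumption: mean subtraction over the whole utterance; the cited
    systems may compute the subtracted vector differently.
    """
    num_frames = len(frames)
    dims = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / num_frames for d in range(dims)]
    return [[frame[d] - mean[d] for d in range(dims)] for frame in frames]
```

A fixed channel adds a constant offset to every cepstral vector, so two recordings that differ only by such an offset give identical features after the subtraction.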

3.3.4 Method of Modelling Keywords 

The first HMM-based word-spotters used whole-word models to represent the 
keywords. [17, 21] These word models are left-right (or Bakis) HMMs and gener- 
ally have three states per phoneme, although the detailed topology of the model 
has often been altered in an attempt to increase word-spotter performance. [17]
The most common representations of a whole-word keyword model are given in 
figure 8, along with an alternative which has been proposed. 
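
A simple left-right topology with three states per phoneme can be written down as a transition matrix. The sketch below assumes the plainest arrangement (each state either repeats or steps to the next one, with no skips) and an arbitrary self-loop probability of 0.6; it is not claimed to match any particular model in the figure.

```python
def left_right_transitions(num_phonemes, states_per_phoneme=3, self_loop=0.6):
    """Transition matrix for a simple left-right (Bakis) word model.

    Returns num_states rows and num_states + 1 columns; the extra final
    column is a non-emitting exit state. The self_loop value is an
    illustrative assumption, not a trained parameter.
    """
    num_states = num_phonemes * states_per_phoneme
    matrix = [[0.0] * (num_states + 1) for _ in range(num_states)]
    for i in range(num_states):
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state or exit
    return matrix
```

A three-phoneme keyword therefore has nine emitting states, and because no entry lies below the diagonal the model can never move backwards through the word.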




[Diagram panels: Model A: Simplest LR HMM; Model B: Most Commonly Used Model; Model C: Explicitly Models Word-Segment Duration (tied outputs)]

Figure 8: Possible HMMs for modelling whole keywords

Model C attempts to model the duration of the phonemes explicitly, rather
than relying on the internal dynamics of the model. When combined with an
increase in the number of output distributions, this model has been shown to
improve a basic keyword-spotter's performance significantly. [17] Model B,
however, continues to be the most popular.

To allow for very large keyword vocabularies and keyword variants which
do not appear in the training data (footnote 4), keywords can be represented by sub-word
models. [19] This allows training information to be shared when building the
keyword models, but permits the possibility of a sub-word unit being inserted,
deleted or substituted in the correct keyword sub-word sequence. This is
especially significant if sub-word units are used for the filler model. To compensate
for this, transition penalties can be added to the network to try to discourage
jumps between the keyword and filler model. [3]

4 e.g. you may have the keywords "football", "manage" and "manager" in the training
data, but want to allow the keyword "footballer".

A further choice concerns the output distribution of the
HMMs. Vector Quantization (VQ) can be used to form a finite set of codebook
vectors to represent the speech. This is computationally simpler but generally
does not model the speech as well as the alternative continuous mixture of
Gaussians. Some Gaussian-mixture based systems use diagonal covariance
matrices [10, 17] or tied states [12] in an attempt to reduce the number of unknown
parameters to be trained. This increases training speed and system performance
if limited training data is available. Either system can be used for both
whole-word and sub-word keyword and filler models, but using both methods in the
same system is inadvisable.
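
The quantisation step itself is simple, which is where the computational saving comes from. The sketch below maps a feature vector to the index of its nearest codebook vector by squared Euclidean distance; the distance measure, the toy codebook in the usage note and the function name are assumptions for illustration.

```python
def quantise(vector, codebook):
    """Return the index of the codebook vector nearest to the input vector."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        range(len(codebook)),
        key=lambda index: squared_distance(vector, codebook[index]),
    )
```

With a codebook of [0, 0], [1, 1] and [4, 4], the vector [0.9, 1.2] maps to index 1. The HMM then sees only that index, so the detail of where the vector lay within its cell is lost, which is the accuracy cost of the discrete approach.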



These points are summarized in the table below. 



Property            +/-   Comment

Whole-word Model    +     Needs no phonetic knowledge of keyword (lexicon)
                    +     Does not allow sub-word insertion/deletion or substitution
                    +     Allows better discrimination against sub-word filler models
                    +     Performs better when few training tokens are available [17]

Sub-word Model      +     Allows very large keyword vocabulary
                    +     Makes it easier to change the keyword vocabulary
                    +     Permits information sharing in training
                    -     Requires a lexicon and transcribed training data

Mixture Gaussian    +     Models the speech better than using VQ.
                    -     Has many unknown parameters to be trained.

VQ (discrete)       -     Accuracy lost when forming codebook vectors
                    +     Computationally simpler than Gaussian mixture.


3.3.5 Method of Modelling and the Nature of Non-keyword Speech

The performance of a keyword-spotter relies heavily on the ability of the filler 
model to accurately represent arbitrary non-keyword speech. [19] For optimum 
performance, the filler(s) must match the speech significantly more closely than 
the keyword HMM network does if, and only if, the speech is not a keyword. 



Several ideas for implementing such a filler model have been proposed. 
Which implementation to use depends on many factors, including the purpose 
of the keyword-spotting, the a-priori knowledge of the input speech, the amount 
of training data available, the type of modelling used for the keywords and the 
desired operating point of the word-spotter. When evaluating the relative mer- 
its of each implementation, the effect on overall word-spotter performance, the 
system's computational complexity, the required knowledge of the speech and 
the type and amount of necessary training data should be considered.

For word-spotting applications with a known and restricted vocabulary in- 
put, it is possible to use a large vocabulary speech recogniser as a word-spotter. 
This effectively uses the parallel combination of all non-keywords as the filler 
model network. By only looking for the occurrences of the keywords, the output 
word-sequence can be converted into a set of putative hits. Such a system could 
be used in message-classification, word-filtering and speech-data compression. 

Using a large vocabulary speech recogniser allows a statistical language 
model to be incorporated in place of the traditional null-grammar of keyword- 
spotters to help improve performance. However, the system results in an ex- 
tremely large network and relies on the non-keyword vocabulary being known 
beforehand. The main principle behind keyword spotting is that the speech does 
not have to be deciphered exactly (a very computationally expensive task), but 
rather that the keywords should be picked out without concern as to the manner 
of the rest of the signal. This means the signal could contain any words, or
non-keyword sounds such as coughing, silence or music. Running a large-vocabulary
recogniser as a word-spotter moves away from this principle by assuming that
the characteristics of the non-keyword speech are known. The system also needs
a huge amount of initial training to produce all the models for the keywords and 
non-keywords. For these reasons, despite the fact that modelling non-keywords 
in parallel is obviously a good model for non-keyword speech and having all 
the possible words modelled allows the keywords to be changed without further 
retraining, I do not think large-vocabulary recognition has a significant place in 
keyword-spotting. 

For more typical systems, the non-keyword vocabulary is very large (footnote 5) and
the amount of training data available is limited, implying a more general filler or
garbage model should be used to model non-keyword speech. In general, most
of the non-keyword utterances in the speech will not have been contained in
the training set. Some method, therefore, of breaking down the words to allow
shared models must be found. This can be accomplished by using sub-word
models to represent non-keyword speech rather than the whole-word models of
the large vocabulary recogniser. This approach is especially beneficial if the
keywords are already modelled using sub-word units.

Rohlicek et al. [17] tried using segments of keywords in parallel as a filler
model. This had the advantage of not requiring additional training, as the
keyword model parameters could be re-used. This approach proves useful when
only keyword speech is available for training, but significantly better
performance is obtained if non-keyword speech is used to train the filler models.



5 indeed, theoretically infinite for an ideal keyword spotter 
One possible closed set of sub-word models is the set of general-context phones. [20]
English has around 53 distinct monophones which can model arbitrary speech
if placed in parallel. This is useful when minimizing recogniser complexity
is important. Rose and Paul [19] show that a general-context monophone-based
system performs only slightly less well than a triphone system (60.6%
to 61.3%) despite its inherent increased recogniser simplicity. Problems with
co-articulation effects, however, often make context-dependent phones (such as
triphones) more successful filler models. [2]

Tri/diphone models require more computation than monophone models, as
they allow more possible sequences through the network, but generally match 
the non-keyword speech better. The highest performance using triphones as 
filler models occurs when no triphones from the keyword(s) appear in the filler
model, as the chance of misclassification of a keyword as a filler in this case is 
minimized. 
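
Excluding keyword triphones from the filler inventory amounts to a set difference. In the sketch below the phone symbols, the "sil" padding at word boundaries and both function names are hypothetical; a real system would take its pronunciations from a lexicon.

```python
def triphones(phones):
    """Return the set of (left, centre, right) triphones for a phone sequence.

    The word is padded with a hypothetical "sil" context at both ends.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return {
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    }


def filler_inventory(all_triphones, keyword_pronunciations):
    """Remove every triphone that occurs in a keyword from the filler set."""
    keyword_triphones = set()
    for pronunciation in keyword_pronunciations:
        keyword_triphones |= triphones(pronunciation)
    return set(all_triphones) - keyword_triphones
```

The filler network built from the remaining triphones cannot reproduce the keyword's own phone sequence exactly, which reduces the chance of a keyword being absorbed by the filler.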

The models discussed above assume some knowledge of the keywords in order
to train the non-keyword filler model. If all the sample speech, rather than just
the non-keyword speech, is used to train the filler model then the model becomes
vocabulary-independent and does not need retraining if a new keyword is added
to the system. Indeed, Wilcox and Bush's word-spotter required only a single
utterance of a new keyword to produce the keyword model, thereby allowing
dynamic specification of keywords at run-time using this approach. [21] The
performance is of course worse than in systems where the keywords and
non-keywords have been totally separated before training, but the increased
flexibility of the system generally outweighs this disadvantage. The amount of
degradation in performance is a function of the number of keyword and
non-keyword occurrences in the training data and the similarity between the
resultant models.

The optimum size of the filler model also depends on the choice of keyword
modelling. Generally, shorter models in a null-grammar network offer a larger
variety of paths, hence increasing the likelihood of matching a non-keyword event.
However, more computation is needed to calculate these paths, and if sub-word
units are used in keyword modelling, the probability of spurious filler insertions
increases as the size of the filler unit decreases. In such a system penalties can
be included to discourage transitions between the keyword and filler networks. If
whole-word keyword models are being used, which impose additional sequential
constraints on the sub-word units, several genuine keyword occurrences may be
rejected by a sub-word filler model word-spotter.

If the word-spotter may need to be reconfigured frequently or no
orthographically transcribed speech is available for training, then unsupervised
clustered learning on unlabelled data can be used to produce an alternative filler
model. Such a system, however, provides a relatively poor model of non-keyword
speech, resulting in a high FAR and low performance of the word-spotter.
Unsupervised training will always produce worse performance than that obtained
using transcribed data, but remains a possibility when no transcribed training
data is available.

In addition to filler models of non-keyword speech, the inclusion of a
background model can increase word-spotter performance. This is used as a base
against which to compare the keyword/filler score and, when included in the
scoring system, accounts for the variability in factors such as channel effects
and volume level. 6

Chigier [5] uses two "sink" models to absorb the non-keyword signal. One
models a non-speech event, whilst the other models speech which does not
contain keywords. Each model allows for "words" between two and seven segments
long and is illustrated in figure 9, where M represents a series of 3 acoustic
models for the non-speech and 15 for the non-keyword speech sink model. These
acoustic models are not used anywhere else in the system, thereby ensuring the
phone models for the keyword speech are not corrupted.

Figure 9: A Sink Model for Non-Keyword Events

3.3.6 Training Methods and Available Training Data 

The amount and type of available training data affects the choice of keyword 
and filler models as discussed in the previous section. If limited training data 
is available, then the performance of the system can be improved by artificially 
transforming the available speech to increase the variability of the training data 
without necessitating additional data collection. [4] 

If labelled training data is available, standard Baum-Welch training is
generally used to determine the parameters of the HMMs. [16] Alternatively, fuzzy
clustering may be used to learn the means and covariance matrices for a mixture
of Gaussians if the amount of training data is limited. [21]

Error-corrective training offers an alternative to the maximum likelihood
approach and has been shown to improve performance when used in isolated
phoneme recognition, [15] but is not generally used in training the primary stage
of word-spotters. Tree-based clustering can also be used in training sub-word
HMMs which use many tied states, thus allowing a large number of sub-word
units to be modelled in a relatively compact way. [8] Vector Quantization is
generally used if a discrete system is required.

6 discussed further in section 3.3.7

The best performance is obtained if truly spontaneous conversational speech
is used for training, rather than the read speech traditionally used. [19] This is
because training data that includes events such as coughing, silence and false
starts is more likely to be representative of the target speech.



3.3.7 Choice of Scoring System in Generating Putative Hits 

There are two main methods of generating putative hits as the speech is
analysed by the word-spotter. The first involves generating a score which
represents the likelihood of a keyword occurring at a given instant in time. This
score can then be processed in a number of ways, the simplest being to pass it
through a simple thresholder in order to ascertain whether a putative hit has
occurred. The second method is a Viterbi pass to generate the most likely
sequence of fillers/keywords in the utterance.

One of the first scoring methods used in word-spotting to help determine
putative hits was a duration-normalised likelihood function:

$$ L_n(t) = p(x_{t-d+1}, \ldots, x_t \mid \text{word } n)^{1/d} $$

where d is an estimate of the duration of word n. This was not very
successful and was superseded by an a posteriori probability, which gave
approximately 5 times higher detection probabilities. [17]
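
As a minimal sketch of the duration-normalised score above, the snippet below works in the log domain, where raising to the power 1/d becomes a division by d. It assumes, purely for illustration, that the word likelihood factorises into per-frame log-likelihoods; the function name and the toy values are hypothetical and do not come from [17].

```python
import math


def duration_normalised_log_likelihood(frame_log_likelihoods, t, d):
    """Return (1/d) * log p(x_{t-d+1}, ..., x_t | word n).

    Assumption of this sketch: the likelihood of the window is the product of
    per-frame likelihoods, so its log is the sum of the per-frame logs.
    Frames are indexed from 0.
    """
    if d <= 0 or t - d + 1 < 0:
        raise ValueError("window does not fit inside the utterance")
    window = frame_log_likelihoods[t - d + 1 : t + 1]
    return sum(window) / d


# Hypothetical per-frame log-likelihoods under one keyword model.
scores = [-4.0, -3.5, -1.0, -0.5, -0.8, -3.9]
log_score = duration_normalised_log_likelihood(scores, t=4, d=3)
likelihood = math.exp(log_score)  # corresponds to p(...)^(1/d)
```

Working with logs avoids numerical underflow when d is large.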

The probability of being in state i at time t, with the observation sequence
$x_1, \ldots, x_t$ given the model M, is given by:

$$ \alpha_i(t) = P(s_t = i, x_1, \ldots, x_t \mid M) $$

These coefficients can be calculated recursively using the standard forward
algorithm for HMMs. [16]

$$ \alpha_j(t) = \left[ \sum_{i}^{N-1} \alpha_i(t-1)\, a_{ij} \right] b_j(x_t) $$

where $a_{ij}$ represents the transition probability from state i to state j and
$b_j(x_t)$ represents the probability of observing $x_t$ given the state j.
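
The forward recursion can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions rather than the code of any cited system: the initial state distribution `pi`, the precomputed observation table `b[t][j]` and the toy model are all hypothetical, and time is indexed from 0.

```python
def forward(a, b, pi):
    """Forward coefficients alpha[t][j] = P(s_t = j, x_0..x_t | M).

    a[i][j] : transition probability from state i to state j
    b[t][j] : probability of observing x_t given state j (precomputed)
    pi[j]   : initial state probabilities (an assumption of this sketch)
    """
    n_states = len(pi)
    alpha = [[pi[j] * b[0][j] for j in range(n_states)]]
    for t in range(1, len(b)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * a[i][j] for i in range(n_states)) * b[t][j]
            for j in range(n_states)
        ])
    return alpha


# Toy two-state left-to-right model with three observations.
a = [[0.6, 0.4],
     [0.0, 1.0]]
b = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]
pi = [1.0, 0.0]
alpha = forward(a, b, pi)
```

For long utterances the coefficients shrink quickly, so a practical implementation would rescale each frame or work with log probabilities.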



The a posteriori probability of the nth keyword ending at time t is thus
simply:

$$ E_n(t) = \Pr(s_t = e_n \mid x_1, \ldots, x_t) = \frac{\alpha_{e_n}(t)}{\sum_i \alpha_i(t)} $$

where the sum is over the states in the network and $e_n$ is the final state in the
nth keyword. By detecting local maxima in $E_n(t)$ and applying a thresholder,
speculations of the most likely occurrences of the keywords can be made.
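
The two steps just described (normalising the forward coefficient of the keyword's final state, then picking thresholded local maxima) can be sketched as follows. The forward coefficients, the choice of state 2 as the final keyword state and the threshold of 0.5 are hypothetical values chosen for illustration.

```python
def keyword_end_posterior(alpha, end_state):
    """E_n(t) = alpha_{e_n}(t) / sum_i alpha_i(t) for every frame t."""
    return [frame[end_state] / sum(frame) for frame in alpha]


def putative_hits(scores, threshold):
    """Frames that are local maxima of the score and exceed the threshold."""
    hits = []
    for t in range(1, len(scores) - 1):
        is_peak = scores[t - 1] < scores[t] >= scores[t + 1]
        if is_peak and scores[t] > threshold:
            hits.append(t)
    return hits


# Hypothetical forward coefficients for a three-state network; state 2 is
# taken to be the final state e_n of the keyword.
alpha = [
    [0.50, 0.40, 0.10],
    [0.30, 0.30, 0.40],
    [0.10, 0.10, 0.80],
    [0.30, 0.40, 0.30],
    [0.60, 0.30, 0.10],
]
E = keyword_end_posterior(alpha, end_state=2)
hits = putative_hits(E, threshold=0.5)
```

Lowering the threshold trades missed keywords for additional false alarms.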

This scoring method exploits only a small amount of the potential
information from the network and results in a large number of false alarms being
generated. This method could be used in applications where false alarms are
not very important, but processing in real-time is critical. Generally, however,
this is not the case and some post-processing is carried out to try to improve
performance by reducing the number of spurious putative hits.

Once the end points of the keywords have been speculated using the forward
search, one possible form of post-processing is to use the corresponding
backward coefficients to locate the start of the word and generate a verification
score to assist in reducing spurious putative hits. [21] The computational
efficiency of the backward coefficient search is increased by only searching when
a keyword has been hypothesised. This implicitly assumes that the forward search
has a high detection probability and that the purpose of the post-processor is
simply to reduce the false alarm rate.



The verification score used in [21] is composed of a comparison of
duration-normalised backward probabilities for the keyword model and the
filler/background model:

$$ S(t, t_e) = \frac{L_{key}(t, t_e)}{L_{key}(t, t_e) + L_{back}(t, t_e)} $$

where

$$ L_{key}(t, t_e) = P(x_t, \ldots, x_{t_e} \mid \text{keyword})^{1/(t_e - t)} = \beta_b(t-1)^{1/(t_e - t)} $$

and b is the start state of the keyword. The start time is then determined by
locating the maximum in $S(t, t_e)$ subject to a constraint on the allowable
duration of the keyword.
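
The start-time search can be illustrated with a short sketch. It assumes that the segment log-likelihoods under the keyword and background models are available through caller-supplied functions (here the hypothetical `seg_log_key` and `seg_log_back`, standing in for the backward coefficients), and that the duration of a segment from t to t_end is t_end - t frames. None of these names come from [21].

```python
import math


def verification_score(log_p_key, log_p_back, duration):
    """S = L_key / (L_key + L_back) with duration-normalised likelihoods."""
    l_key = math.exp(log_p_key / duration)
    l_back = math.exp(log_p_back / duration)
    return l_key / (l_key + l_back)


def best_start(t_end, seg_log_key, seg_log_back, min_dur, max_dur):
    """Start time maximising S(t, t_end) within the allowed duration range."""
    best = None
    for dur in range(min_dur, max_dur + 1):
        t = t_end - dur
        if t < 0:
            break
        s = verification_score(seg_log_key(t, t_end), seg_log_back(t, t_end), dur)
        if best is None or s > best[1]:
            best = (t, s)
    return best


# Toy models: the background costs 1.0 nat per frame; the keyword costs
# 0.5 nat per frame plus a penalty when the duration differs from 5 frames.
def seg_log_back(t, t_end):
    return -1.0 * (t_end - t)


def seg_log_key(t, t_end):
    return -0.5 * (t_end - t) - 2.0 * abs((t_end - t) - 5)


start, score = best_start(10, seg_log_key, seg_log_back, min_dur=3, max_dur=8)
```

Because the search runs only after the forward pass has hypothesised an end point, its cost is paid per putative hit rather than per frame.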

Alternatively, a combination of the forward and backward scores can be used
directly to produce a word score representing the probability that keyword k
ends at time t given all the observations. [18]

$$ S(k, t) = \frac{\alpha_s(t)\, \beta_s(t)}{\sum_i \alpha_i(t)\, \beta_i(t)} \quad \forall \text{ end-of-word states } s $$

Time-synchronous Viterbi beam-search decoding can also be used to generate
putative hits. [19, 20] The Viterbi algorithm evaluates the most likely state
sequence through the speech using the well-known formula:

$$ V_j(t) = \max_i \left[ V_i(t-1)\, a_{ij} \right] b_j(x_t) $$
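
A minimal sketch of the Viterbi recursion with back-pointers is given below. The initial distribution `pi`, the precomputed observation table `b[t][j]` and the toy two-state model are assumptions made for illustration; a word-spotter would run the same recursion over the full keyword/filler network, usually in the log domain and with beam pruning.

```python
def viterbi(a, b, pi):
    """Most likely state sequence using V_j(t) = max_i[V_i(t-1) a_ij] b_j(x_t).

    a[i][j] : transition probability from state i to state j
    b[t][j] : probability of observing x_t given state j (precomputed)
    pi[j]   : initial state probabilities (an assumption of this sketch)
    Returns (path, probability of that path).
    """
    n = len(pi)
    v = [pi[j] * b[0][j] for j in range(n)]
    backptr = []
    for t in range(1, len(b)):
        new_v, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: v[i] * a[i][j])
            ptr.append(best_i)
            new_v.append(v[best_i] * a[best_i][j] * b[t][j])
        v = new_v
        backptr.append(ptr)
    # Trace back from the most likely final state.
    state = max(range(n), key=lambda j: v[j])
    best_prob = v[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob


a = [[0.6, 0.4],
     [0.0, 1.0]]
b = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]
pi = [1.0, 0.0]
path, prob = viterbi(a, b, pi)
```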

This can be used to ascertain which path between the grammar nodes was
taken at any given time and hence which keyword (if any) was recognised.
This information can be used directly to produce a binary-type output of when
putative hits have been generated. Alternatively, by storing the probability
associated with going from the first state of the keyword model to the last given
the observations and then performing duration-normalisation, a likelihood of
the putative hit being correct can be evaluated. This is important if secondary
processing to reduce the false alarm rate is going to be used subsequently. Rose
and Paul [19] use a score based on a similar argument:

$$ S_{KW} = \frac{\log L_{KW}}{t_{end} - t_{start}} $$

where $L_{KW}$ is the likelihood accumulated along the keyword path between
$t_{start}$ and $t_{end}$.



They then modified this score by subtracting the corresponding background
score 7 to give a more representative likelihood ratio, although it has since been
shown that better results can be obtained if the ratio rather than the
subtraction of the log scores is used. [14]
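
The two ways of combining the keyword and background scores can be compared with a small sketch. It assumes that each score is a path log-likelihood divided by the duration of the putative hit; the function names and the numbers are hypothetical and are not taken from [19] or [14].

```python
def normalised_log_score(path_log_likelihood, t_start, t_end):
    """Path log-likelihood divided by the duration of the putative hit."""
    return path_log_likelihood / (t_end - t_start)


def background_difference(keyword_log, background_log, t_start, t_end):
    """Keyword score minus background score (a log likelihood ratio)."""
    return (normalised_log_score(keyword_log, t_start, t_end)
            - normalised_log_score(background_log, t_start, t_end))


def background_ratio(keyword_log, background_log, t_start, t_end):
    """Ratio of the two log scores, the alternative mentioned in the text."""
    return (normalised_log_score(keyword_log, t_start, t_end)
            / normalised_log_score(background_log, t_start, t_end))


# Hypothetical putative hit spanning frames 10 to 40.
diff = background_difference(-42.0, -60.0, t_start=10, t_end=40)
ratio = background_ratio(-42.0, -60.0, t_start=10, t_end=40)
```

In this toy case the keyword path is more likely than the background path, so the difference is positive and the ratio of the (negative) log scores is below one.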

In order to reduce the delay between the speech entering the keyword-spotter
and the putative hits being generated, a partial Viterbi backtrace can be used
to identify any states through which all the active Viterbi paths pass
simultaneously. The most likely sequence of keywords/fillers can then be
evaluated up to that state and the putative hits generated as before. This
reduction of delay allows the system to operate almost as quickly as the
aforementioned forward approach without restricting the scoring to rely on local
maxima.

The delay will be dependent on how many convergent nodes there are in the
network. Generally the frequency of occurrence of these nodes can be increased
by using a beam-search to de-activate Viterbi paths with relatively low
probabilities without significant degradation in performance. A study of the
delays by Rose and Paul found them to be under 3 seconds in all observed
examples. [19]

In general a Viterbi path produces only a single operating point of detection
rate versus false alarm rate, but by adding variable transition costs to the overall
model to penalise certain word sequences, the operating point can be varied and
indeed the word-spotter performance improved. [3] It has been shown [18, 22]
that given the same mix of false alarms, both the forward-backward and Viterbi
state sequence scoring methods give comparable results for word-spotting. The
forward-backward search, however, offers a small computational advantage over
the Viterbi search as back-tracking is only required when a putative hit has been
hypothesised. It has also been shown to run 20% faster than the Viterbi search
in one experiment [22], although the increase in speed depends on the frequency
of keyword occurrences in the speech.



7 obtained by passing through the network without the keyword paths

3.3.8 Including Secondary Processing

Secondary processing is a broad term which includes many different procedures
for different scoring methods and word-spotting applications. All these methods,
however, share the common goal of trying to improve the list of putative hits to
increase the word-spotter's performance. This generally involves attempting to
reduce the false alarm rate by removing spurious hits, but can also include
trying to recover any keywords which the word-spotter has missed. A simple form
of secondary processing using backward verification scores from forward-search
putative hits has already been mentioned in section 3.3.7.

Rose [20] uses a word-spotter as an acoustic front end to a speech information 
retrieval system. He incorporates some secondary processing by introducing a 
"message-class corrective keyword detector" between the word-spotter and the 
information-retrieval system, which is designed to compensate for the effect of 
keyword false-alarms on performance. 

Rather than manually classifying all the putative hits as true hits or false
alarms and using these as target values to train a maximum-likelihood
neural-network based secondary processor, Rose defines an error criterion to be
minimized and uses unsupervised learning and back-propagation to train the
network to allow modification of the list of putative hits. This eliminates the
need to have labelled putative hits, but is application specific in that the
optimisation criterion relates to the overall message-classification task and not
the word-spotting output. An improvement in the system performance is obtained,
but this is clouded by his use of the same data in both training and testing and
the lack of generalisation possible from the resulting system.

Gish et al. [11] use labelled putative events from the primary word-spotter
to train keyword-specific segmental models of variable duration for a secondary
processor which produces a new score. When combined with the primary
word-spotting score, the false alarm rate of the overall word-spotter can be
reduced and hence the FOM increased. 8

The aim of the segmental approach is to group together adjacent frames with 
similar acoustics in an attempt to allow better discrimination between segments 
than would be possible if only frames were used. Once segmentation has been 
completed, one mixture model representing a true hit and one for false alarms 
are generated for each keyword using the relevant labelled putative hits and 
the Expectation Maximisation algorithm. The resulting keyword models are a 
conglomeration of the appropriate segment mixture model, segment transition 
model and keyword duration model. 

The secondary score is the log likelihood ratio between the probabilities that
the putative hit, E, matches the true-hit model and the false-alarm model:

$$ S_s = \log \frac{p(E \mid \text{truth model})}{p(E \mid \text{false model})} $$

8 From 67.5% to 72.0% in this experiment. [11]



These probabilities are calculated using a dynamic programming approach to
find the most probable path of the putative hit through the segments, taking the
transition probabilities between segments, the likelihood of each segment and
the duration model of the keyword into account. The primary and secondary
scores are then range-normalised and added to produce a final combined score
according to the formula:

$$ S = \frac{S_p - S_p^{10}}{S_p^{90} - S_p^{10}} + \frac{S_s - S_s^{10}}{S_s^{90} - S_s^{10}} $$

where $S_p$ and $S_s$ are the primary and secondary scores and $S^n$ is the nth
percentile of S.

This is used to re-order the set of putative hits and reduce the false-alarm
rate of the word-spotter. A more sophisticated combination procedure could
probably increase the performance of the word-spotter further. Note that the
improvement in performance from using this secondary processor is highly
dependent on the training data available. If no false alarms are generated for a
given keyword from the training data, then no secondary model can be produced
and the secondary processing will not help reject spurious false alarms for that
keyword. For maximum benefit, therefore, a large amount of training data is
required which includes many instances of the keyword(s), allowing the primary
word-spotter to generate several true hits and false alarms.

3.4 Reducing Run-times Using Lattice-Methods

The keyword/filler model word-spotters discussed in the previous section
perform well if the desired keywords are known in advance. However, specifying
new keywords necessitates re-running the word-spotter and in some cases
further training if no model for the new keyword already exists. 9 This can lead
to slow performance. For word-spotting applications such as message retrieval
from a large speech corpus, where many different keywords may be used to try
to recover the required information, the flexibility in choice of keywords and the
time taken to locate them are crucial.

James and Young [13] take a novel approach to this problem by pre-processing
the speech to form a phone-lattice, which can then be searched at run-time to
locate the probable occurrences of the keywords. Such a system requires an
increased storage capacity to hold the lattice and uses more time in
pre-processing, but once processed, the word-spotter has been shown to run up to
360 times faster than real time for one keyword. [13] This word-spotter also
allows the unrestricted re-specification of keywords without necessitating
further retraining.

9 For example if using a whole-word based word-spotter such as a large vocabulary system
where the new keyword is not included in the system vocabulary

The phone lattice is a connected loop-free directed graph and consists of
nodes representing given points in time and edges representing the most
probable phone hypotheses and their associated likelihood. The recognition
network consists of all possible phones in parallel and the lattice is generated
using a modified Viterbi speech recogniser based on the Token Passing Paradigm.
The degree, N, of the lattice is the maximum number of edges that can begin at any
node. Choosing the degree for the lattice is a trade-off between the increased
possibility of missed hypotheses as the degree is reduced and the increased
search time and storage requirement for larger values of N. An example of a
phone-lattice of degree 2 is given in figure 10.
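
One way to picture this data structure is the sketch below, which stores nodes as time points and keeps at most N outgoing edges per node. It is an illustrative container only, not the token-passing lattice generator of [13]; the class names, the pruning rule (keep the N most likely edges) and the scores are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    phone: str
    log_likelihood: float
    end_node: int  # index of the node (time point) where the edge ends


@dataclass
class PhoneLattice:
    degree: int                                  # N: max edges leaving a node
    node_times: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)    # start node -> list of Edge

    def add_node(self, time):
        self.node_times.append(time)
        return len(self.node_times) - 1

    def add_edge(self, start, end, phone, log_likelihood):
        if self.node_times[end] <= self.node_times[start]:
            raise ValueError("edges must move forward in time (loop-free)")
        out = self.edges.setdefault(start, [])
        out.append(Edge(phone, log_likelihood, end))
        # Keep only the `degree` most likely hypotheses leaving this node.
        out.sort(key=lambda e: e.log_likelihood, reverse=True)
        del out[self.degree:]


# Hypothetical hypotheses for the first phone of an utterance.
lattice = PhoneLattice(degree=2)
n0, n1 = lattice.add_node(0.00), lattice.add_node(0.12)
for phone, score in [("s", -101.8), ("sh", -95.9), ("ch", -120.4)]:
    lattice.add_edge(n0, n1, phone, score)
```

With degree 2 the least likely hypothesis ("ch") is discarded, which illustrates how a small N saves storage at the risk of losing the correct phone.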




Figure 10: Example of a phone-lattice of degree 2 for the word ship

Recognition is then achieved by searching through the lattice to find the
desired keyword phone-sequence. Phone insertions, deletions and substitutions
can be allowed in the dynamic programming paths using empirically determined
penalties for each action, but this increases the required computation time and
the false alarm rate. Measures to limit this extra computation, such as defining
"strong" phones which must be matched exactly or only allowing substitutions
by similar types of phones, further increase the amount of information storage
and pre-processing necessary. James and Young [13] found only a 2% drop in
FOM for triphone-based experiments when marking all phones as strong (i.e.
allowing no substitutions, deletions or insertions) with a corresponding 50%
increase in speed.
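
The simplest case, in which every phone is "strong" and must match exactly, can be sketched as a depth-first search over the lattice. The dictionary representation of the lattice, the function name and the scores are hypothetical; the penalised insertions, deletions and substitutions described above are deliberately left out.

```python
def find_keyword(edges, phones):
    """Return (start_node, end_node, log_likelihood) for every exact match.

    edges  : dict mapping a start node to a list of (phone, log_lik, end_node)
    phones : the keyword as a list of phone labels
    """
    hits = []

    def extend(node, index, score, start):
        if index == len(phones):
            hits.append((start, node, score))
            return
        for phone, log_lik, end in edges.get(node, []):
            if phone == phones[index]:
                extend(end, index + 1, score + log_lik, start)

    for start in edges:
        extend(start, 0, 0.0, start)
    return hits


# Hypothetical degree-2 lattice with made-up log-likelihoods.
lattice = {
    0: [("sh", -9.5, 1), ("s", -11.0, 1)],
    1: [("ih", -5.2, 2), ("iy", -6.1, 2)],
    2: [("p", -10.1, 3), ("b", -10.9, 3)],
}
hits = find_keyword(lattice, ["sh", "ih", "p"])
```

Because the lattice is built once, each new keyword costs only a search of this kind, which is what makes run-time keyword specification cheap.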

An alternative method for lattice building and searching which allows more
flexible models of keywords has been proposed by Gelin and Wellekens [10]. The
a posteriori probability of a phone occurring is calculated using the standard
forward coefficients or a Multi-Layer Perceptron. After smoothing by low-pass
filtering, segments of the speech where this smoothed probability exceeds a
certain threshold value are detected and denoted $X_t^s = [x_t, \ldots, x_s]$ as
shown in figure 11.

Figure 11: Generating Phone Hypothesis Using Thresholding

Assuming the acoustic vectors are independent, from Bayes' rule a
duration-independent likelihood of the phone $\psi$ given the segment is given by

p W x t )- ULtP{xi) - w -t 
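
The segment-detection step can be sketched as below. Two choices here are assumptions of the sketch and are not taken from [10]: a moving average stands in for the low-pass filter, and the geometric mean of the frame posteriors stands in for the duration-independent segment score.

```python
import math


def smooth(values, width=3):
    """Moving-average low-pass filter (one possible choice of smoother)."""
    half = width // 2
    out = []
    for t in range(len(values)):
        window = values[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out


def segments_above(values, threshold):
    """Maximal runs (start, end), inclusive, where values exceed threshold."""
    segments, start = [], None
    for t, v in enumerate(values):
        if v > threshold and start is None:
            start = t
        elif v <= threshold and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(values) - 1))
    return segments


def segment_score(posteriors, start, end):
    """Geometric mean of the frame posteriors over the segment."""
    logs = [math.log(p) for p in posteriors[start:end + 1]]
    return math.exp(sum(logs) / len(logs))


# Hypothetical frame-level posteriors for a single phone.
posteriors = [0.1, 0.1, 0.8, 0.9, 0.8, 0.1, 0.1]
segs = segments_above(smooth(posteriors), threshold=0.5)
score = segment_score(posteriors, *segs[0])
```

Each detected segment, together with its phone label, score and start and end times, would then become one hypothesis in the lattice.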

For each segment a hypothesis consisting of the phone being considered, this
associated probability and the start and end times of the segment is added to
the lattice, L. To account for variability in pronunciation, each phone, $\psi$,
in the lattice is allowed to be replaced by its q most confusable phones, $\psi_t$
(t = 1 to q), generating a series of possible sequences, $\phi_g$. Each such
sequence has an associated sequence confusion probability, $P(\phi_g)$, based on
the confusion probabilities between phones, $P(\psi \mid \psi_g)$. The search
stage then simply reduces to finding the maximum probability of the occurrence of
each sequence, $\phi_g$, followed by finding the most likely occurrence of the
keyword sequence, $\phi$:

$$ P(\phi \mid L) = \max_g \left[ P(\phi_g \mid L)\, P(\phi_g) \right] $$
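
The maximisation over confusable variants can be sketched as follows. The sketch assumes that the sequence confusion probability is the product of the per-phone confusion probabilities and that the lattice probability of a sequence is available through a lookup function; both assumptions, the phone labels and all numbers are hypothetical.

```python
from itertools import product


def best_confusable_match(keyword, confusions, lattice_prob):
    """max over variants g of P(phi_g | L) * P(phi_g).

    keyword      : list of phones
    confusions   : phone -> list of (alternative, confusion probability),
                   including the phone itself
    lattice_prob : function mapping a phone tuple to P(sequence | L)
    """
    best_seq, best_p = None, 0.0
    options = [confusions[p] for p in keyword]
    for variant in product(*options):
        seq = tuple(alt for alt, _ in variant)
        p_conf = 1.0
        for _, p in variant:
            p_conf *= p  # assumed: sequence probability is a product
        p_total = lattice_prob(seq) * p_conf
        if p_total > best_p:
            best_seq, best_p = seq, p_total
    return best_seq, best_p


# Toy example: a keyword whose vowel may be realised as "ae" or "aa".
confusions = {
    "b": [("b", 1.0)],
    "ae": [("ae", 0.7), ("aa", 0.3)],
    "th": [("th", 1.0)],
}
table = {("b", "ae", "th"): 0.05, ("b", "aa", "th"): 0.6}
seq, prob = best_confusable_match(
    ["b", "ae", "th"], confusions, lambda s: table.get(s, 0.0)
)
```

Here the lattice strongly supports the alternative vowel, so the confusable variant wins even though its confusion probability is lower.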

The use of confusion probabilities allows speaker-independence to be achieved 
across regional accents. For example, the acoustical dissimilarity of the Northern 
English /ae/ and the Southern English /a:/ for the letter a, will not degrade 
performance in a confusion-based system as much as in a purely acoustically 
based system. 

4 Conclusions 

Word-spotting has many applications ranging from telephone routing and
message classification to new-word identification and speech indexing. This essay
has discussed the fundamental principles behind word-spotting and described
some techniques currently used to implement such a system.

The method of implementation depends on many factors including the
necessary speed and accuracy of the word-spotter, the amount and type of
training data available, the number of desired keywords, the required flexibility
of the keyword vocabulary and the place of the word-spotter in an overall
system. 10 The main word-spotting implementations are summarized in the
following table.



10 such as a message classifier

Property                        Conditions for Most Appropriate Use
------------------------------  -------------------------------------------------
HMM keyword/filler model
  Whole-word keyword models     Small set of pre-defined keywords
                                Many instances of keywords in training data
  Sub-word keyword models       Few instances of keywords in training data
                                Possibility of changing keywords exists
  Large Vocab. system           Non-keyword vocabulary known
                                Large but fixed keyword vocabulary
Lattice approaches              Fast implementation critical
                                Keyword specification at run-time necessary
                                Required storage and pre-processing not important
  Confusion-based               Variation across regional accents is known


A commonly used HMM-based system has been explained in detail and the
effect of varying several key parameters has been investigated. A null-grammar
was found to be the most appropriate language model for most word-spotting
applications. Mel-frequency cepstral coefficients and their derivatives proved
to be the most successful choice of feature vector. The system performance
was shown to improve by increasing the number of syllables in keywords. The
most appropriate choice of keyword and filler model implementation was shown
to depend on the number and desired flexibility of keywords, the nature and
knowledge of non-keyword speech and the training data available. Whole-word
keyword models work well when many instances of the keywords occur in the
training data and a small fixed keyword vocabulary is used. Sub-word units
(of which the triphone proved to be the most successful) can be used to allow
increased variability in choice of keywords and shared training of models from
limited data, but suffer from permitting spurious sub-word substitutions,
deletions and insertions.

Forward-backward and Viterbi scoring were discussed and shown to produce
similar performance, although the forward-backward method offers slight
computational advantages. Normalisation of the putative score using the
background model score was incorporated to improve performance. Finally, the
ability of secondary processing to increase performance by reducing the false
alarm rate in the putative hits was demonstrated.

Some alternative lattice-based approaches were offered for applications where
speed and flexibility of keyword choice are crucial and the ability to model
regional accent variations was demonstrated. Other possible word-spotter
implementations, such as those using neural networks, also exist [1, 6, 9, 10]
but have not been detailed in this essay.

Word-spotting covers a large range of applications and different approaches
to the problem are appropriate for each set of circumstances. This essay has
investigated the advantages and disadvantages of various implementations and
suggested applications for which each might be suitable.